Abstract: In distributed storage systems that use coding, 
the issue of minimizing the communication required to 
rebuild a storage node after a failure arises. We consider 
the problem of repairing an erased node in a distributed 
storage system that uses an EVENODD code. EVENODD 
codes are maximum distance separable (MDS) array codes 
that are used to protect against erasures, and only require 
XOR operations for encoding and decoding. We show that 
when there are two redundancy nodes, to rebuild one erased 
systematic node, only 3/4 of the information needs to be 
transmitted. Interestingly, in many cases, the required disk 
I/O is also minimized. 

I. Introduction 

Coding techniques for storage systems have been used
widely to protect data against errors or erasures for CDs,
DVDs, Blu-ray Discs, and SSDs. Assume the data in
a storage system is divided into packets of equal sizes.
An (n, k) block code takes k information packets and
encodes them into a total of n packets of the same
size. Among coding schemes, maximum distance separable
(MDS) codes offer maximal reliability for a given
redundancy: any k packets are sufficient to retrieve all
the information. Reed-Solomon codes [1] are the best-known
MDS codes and are used widely in storage
and communication applications. Another class of MDS
codes are MDS array codes, for example EVENODD [2]
and its extension [3], B-code [4], X-code [5], RDP [6],
and STAR code [7]. In an array code, each of the packets
consists of a column of elements (one or more binary
bits), and the parities are computed by XORing some
information bits. These codes have the advantage of low
computational complexity over RS codes because the
encoding and decoding only involve XOR operations.

Distributed storage systems involving storage nodes
connected over networks have recently attracted a lot of
attention. MDS codes can be used for erasure protection
in distributed storage systems where encoded information
is stored in a distributed manner. If no more than n - k
storage nodes are lost, then all the information can still
be recovered from the surviving packets. Suppose one
packet is erased, and instead of retrieving the entire
k packets of information, we are only interested in
repairing the lost packet. What is the smallest amount
of transmission needed (called the repair bandwidth)? If
we transmit k packets from the other nodes to the erased
one, then by the MDS property, we can certainly repair
this node. But can we transmit less than k packets? More
generally, if no more than n - k nodes are erased, what
is the repair bandwidth? This repair problem was first
raised in [8], and was further studied in several works
(e.g., [9]-[14]). A recent survey of this problem can be
found in [15]. In [8], a cut-set lower bound for repair
bandwidth is derived and in [11], [12], [13], this lower
bound is matched for exact repair by code constructions
for k = 2, 3, n - 1 and 2k < n. All of these constructions,
however, require large finite fields. Very recently it was
established that the cut-set bound of [8] is achievable for
all values of k and n [13], [14]. However, the proof is
theoretical and is based on very large finite fields. Hence,
it does not provide the basis for constructing practical
codes with small finite fields and high rate.

In this paper we take a different route: rather than trying
to construct MDS codes that are easily repairable, we
try to find ways to repair existing codes, and specifically
focus on the families of MDS array codes. A related and
independent work can be found in [16], where single-disk
recovery for RDP code was studied, and the recovery
method and repair bandwidth are indeed similar to our
result. In addition, [16] discussed balancing disk I/O reads
in the recovery. Our work discusses single- and double-disk
recovery for EVENODD, X-code, STAR, and RDP codes.

If the whole data object stored has size M bits,
repairing a single erasure naively would require communicating
(and reading) M bits from surviving storage
nodes. Here we show that a single failed systematic node
can be rebuilt after communicating only $\frac{3}{4}M + O(M^{1/2})$
bits. Note that the cut-set lower bound [8] scales like
$\frac{1}{2}M + O(M^{1/2})$, so it remains open if the repair
communication for EVENODD codes can be further reduced.
Interestingly, our repair scheme also requires significantly
fewer disk I/O reads compared to naively reading the whole
data object.

The rest of this paper is organized as follows. In
Section II we define the EVENODD code and
the repair problem. The repair of one lost node is
presented in Section III for EVENODD (k = n - 2) and
in Section IV for the extended EVENODD (k < n - 2).
In Section V we consider the case with two erased nodes
and k = n - 3. Conclusions are given in Section VI.

II. Definitions 

An R × n array code contains R rows and n columns
(or packets). Each element in the array can be a single
bit or a block of bits. We are going to call an element
a block. In an (n, k) array code, k information columns,
or systematic columns, are encoded into n columns. The
total amount of information is M = Rk blocks.

An EVENODD code is a binary MDS array code
that can correct up to 2 column erasures. For a prime
number p ≥ 3, the code contains R = p - 1 rows and
n = p + 2 columns, where the first k = p columns
are information and the last two are parity. The
information is M = (p - 1)p blocks.

We will write an EVENODD code as:

\[
\begin{array}{cccccc}
a_{1,1} & a_{1,2} & \cdots & a_{1,p} & b_{1,0} & b_{1,1} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,p} & b_{2,0} & b_{2,1} \\
\vdots & \vdots & & \vdots & \vdots & \vdots \\
a_{p-1,1} & a_{p-1,2} & \cdots & a_{p-1,p} & b_{p-1,0} & b_{p-1,1}
\end{array}
\]

We define an imaginary row $a_{p,j} = 0$, for all $j = 1, 2, \dots, p$,
where 0 is a block of zeros. The slope 0 or
horizontal parity is defined as

\[
b_{i,0} = \sum_{j=1}^{p} a_{i,j} \tag{1}
\]

for $i = 1, \dots, p-1$. The addition here is bit-by-bit XOR
for two blocks. A parity block of slope $v$, $-p < v < p$
and $v \neq 0$, is defined as

\[
b_{i,v} = \sum_{j=1}^{p} a_{j,\langle i+v(1-j) \rangle} + S_v
        = \sum_{j=1}^{p} a_{\langle i+v(1-j) \rangle, j} + S_v \tag{2}
\]

where $S_v = a_{p,1} + a_{p-v,2} + \cdots + a_{\langle p+v \rangle, p}
= \sum_{j=1}^{p} a_{\langle v(1-j) \rangle, j}$ and
$\langle x \rangle = (x - 1) \bmod p + 1$.
Sometimes we omit the "<>" notation. When v = 1,
we call it the slope 1, or diagonal parity. In EVENODD,
parity columns are of slopes 0 and 1.
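
The encoding rules (1) and (2) for slopes 0 and 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: blocks are modeled as Python integers so that `^` plays the role of the bit-by-bit XOR, and the names `wrap` and `evenodd_encode` are hypothetical helpers introduced here.

```python
from functools import reduce
from operator import xor


def wrap(x, p):
    """The paper's <x> = (x - 1) mod p + 1, mapping integers into 1..p."""
    return (x - 1) % p + 1


def evenodd_encode(data, p):
    """Encode a (p-1) x p array of information blocks (integers).

    data[i-1][j-1] is block a_{i,j}. Returns (horizontal, diagonal),
    the parity columns of slope 0 and slope 1.
    """
    def a(i, j):
        # 1-based access; row p is the imaginary all-zero row.
        return 0 if i == p else data[i - 1][j - 1]

    cols = range(1, p + 1)
    # Equation (1): horizontal parity of each row.
    horizontal = [reduce(xor, (a(i, j) for j in cols)) for i in range(1, p)]
    # S_1: the diagonal through the imaginary block a_{p,1}.
    s1 = reduce(xor, (a(wrap(1 - j, p), j) for j in cols))
    # Equation (2) with v = 1: diagonal parity plus S_1.
    diagonal = [reduce(xor, (a(wrap(i + 1 - j, p), j) for j in cols)) ^ s1
                for i in range(1, p)]
    return horizontal, diagonal
```

Modeling blocks as integers keeps the sketch short; a real system would XOR byte buffers instead.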

A similar code is RDP [6], where R = p - 1, n = p + 1,
and k = p - 1, for a prime number p. The diagonal parity
sums up both the corresponding information blocks and
one horizontal parity block. Another related code is
X-code [5], where the parity blocks are of slope -1 and 1,
and are placed as two additional rows, instead of two
parity columns.

The code in [3] extends EVENODD to more than
2 columns of parity. This code has n = p + r, k = p,
and R = p - 1. The information columns are the same as
EVENODD, but r parity columns of slopes 0, 1, ..., r - 1
are used. It is shown in [3] that such a code is MDS when
r ≤ 3, and conditions for a code to be MDS are derived
for r ≤ 8.

STAR code [7] is an MDS array code with k = p,
R = p - 1, n = p + 3, and the parity columns are of slope 0,
1, and -1.

A parity group $B_{i,v}$ of slope $v$ contains a parity block
$b_{i,v}$ and the information blocks in the sum in equations
(1) and (2), $i = 1, 2, \dots, p-1$. $S_v$ is considered as a single
information block. If v = 0, it is a horizontal parity
group, and if v = 1, we call it a diagonal parity group.

By (2), each horizontal parity group $B_{i,0}$ contains
$a_{i,\langle k+1-i \rangle} \in B_{k,1}$, for all $k = 1, 2, \dots, p-1$. So we
say $B_{i,0}$ crosses with $B_{k,1}$, for all $k = 1, 2, \dots, p-1$.
Conversely, each diagonal parity group $B_{i,1}$ contains
$a_{k,\langle i+1-k \rangle} \in B_{k,0}$, for all $k = 1, 2, \dots, p-1$. Therefore,
$B_{i,1}$ crosses with $B_{k,0}$ for all $k = 1, 2, \dots, p-1$. The
shared block of two parity groups is called the crossing.
Generally, two parity groups $B_{i,v}$ and $B_{k,u}$ cross, for
$v \neq u$, $1 \le i, k \le p-1$. If they cross at $a_{p,\langle i+v \rangle} = 0$,
we call it a zero crossing. A zero crossing does not really
exist since the p-th row is imaginary. A zero crossing
occurs if and only if

\[
u, v \neq 0 \quad \text{and} \quad \langle i + v \rangle = \langle k + u \rangle \tag{3}
\]

Moreover, each information block belongs to only one
parity group of slope v.

Suppose the n packets are stored in n different nodes
in a connected network. Each storage node contains
exactly one packet (or one column). Assume n - d
nodes are erased, d ≥ k. Suppose we recover the nodes
successively. For any specified erased node, how many
blocks from the other storage nodes are needed to recover
it? We can either send data in a single block, or a
linear combination of several blocks in one node, both
of which are counted as one block of transmission.
The total number of blocks transmitted to recover the
specified node is called the repair bandwidth $\gamma$. The
repair problem for distributed storage systems asks what
the smallest $\gamma$ is, for fixed M, d, k. In [8], a cut-set lower
bound is derived (and is achieved only when each node
transmits the same number of blocks):

\[
\gamma \ge \frac{Md}{k(d-k+1)} \tag{4}
\]

In this paper, we use MDS array codes as distributed
storage codes. We will give repair methods and compute
the corresponding bandwidth $\gamma$.

Example 1. Consider the EVENODD code with p = 3.
Set $a_{1,3} = a_{2,3} = 0$ for all codewords, then the code
will contain only 2 columns of information. The resulting
code is a (4, 2) MDS code and this is called shortened
EVENODD (see Figure 1). It can be verified that if any
node is erased, then sending 1 block from each of the other
nodes is sufficient to recover it. This actually matches
the bound (4). Figure 1 shows how to recover the first or
the fourth column. Notice that a sum block is sent in some
cases. For instance, to recover the first column, the sum
$b_{1,1} + b_{2,1}$ is sent from the fourth column.
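
As a quick check of the claim that Example 1 meets the cut-set bound, the bound (4) can be evaluated directly. The function name `cutset_bound` is a hypothetical helper introduced for this sketch; exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction


def cutset_bound(M, d, k):
    """Lower bound (4): gamma >= M*d / (k*(d-k+1)), measured in blocks."""
    if d < k:
        raise ValueError("at least k helper nodes are needed")
    return Fraction(M * d, k * (d - k + 1))


# Shortened (4, 2) EVENODD of Example 1: R = 2 rows and k = 2 columns give
# M = 4 blocks, and d = 3 surviving nodes take part in the repair.
print(cutset_bound(4, 3, 2))  # 3
```

The value 3 agrees with the three transmitted blocks in Example 1.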
Fig. 1. Repair of a (4, 2) EVENODD code if the first column (top
graph) or the fourth column (bottom graph) is erased. In both cases,
three blocks are transmitted.

In this paper, shortening of a code is not considered
and we will focus on the recovery of systematic nodes,
given that 1 or 2 systematic nodes are erased. We
send no linear combinations of data except the sum
$\sum_{i=1}^{p-1} b_{i,v}$ from the parity node of slope v, for all v
defined in an array code. In addition, we assume that
each node can transmit a different number of blocks.

III. Repair for Codes with 2 Parity Nodes

First, let us consider the repair problem of losing
one systematic node, n - d = 1, and n - k = 2. We
will use EVENODD to explain the repair method, and
the recovery will be very similar if RDP or X-code is
considered.

By the symmetry of the code, we assume that the
first column is missing. Each block in the first column
must be recovered through either the horizontal or the
diagonal parity group including this block. Suppose we
use x horizontal parity groups and p - 1 - x diagonal
parity groups to recover the column, 0 ≤ x ≤ p - 1.
These parity groups include all blocks of the first column
exactly once.

Notice that $S_1 = \sum_{i=1}^{p-1} b_{i,0} + \sum_{i=1}^{p-1} b_{i,1}$, so we can
send $\sum_{i=1}^{p-1} b_{i,0}$ from the (p + 1)-th node, and $\sum_{i=1}^{p-1} b_{i,1}$
from the (p + 2)-th node, and recover $S_1$ with 2 blocks
of transmission. For the discussion below, assume $S_1$ is
known.

For each horizontal parity group $B_{i,0}$, we send $b_{i,0}$
and $a_{i,j}$, $j = 2, 3, \dots, p$. So we need p blocks. For each
diagonal parity group $B_{i,1}$, as $S_1$ is known, we send $b_{i,1}$
and $a_{j,\langle i+1-j \rangle}$, $j = 1, 2, \dots, i-1, i+1, \dots, p-1$,
which is p - 1 blocks in total.

If two parity groups cross at one block, there is no
need to send this block twice. As shown in Section II,
any horizontal and any diagonal parity group cross at a
block, and each block can be the crossing of two groups
at most once. There are $x(p-1-x)$ crossings. The total
number of blocks sent is



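\[
\begin{aligned}
\gamma &= \underbrace{xp}_{\text{horizontal}}
        + \underbrace{(p-1-x)(p-1)}_{\text{diagonal}}
        - \underbrace{x(p-1-x)}_{\text{crossings}} + 2 \\
       &= (p-1)p + 2 - (x+1)(p-1-x) \\
       &\ge (p-1)p + 2 - (p^2-1)/4 = (3p^2 - 4p + 9)/4
\end{aligned} \tag{5}
\]

The equality holds when x = (p - 1)/2 or x = (p - 3)/2,
where x is an integer.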
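
The minimization in (5) is easy to confirm numerically. The sketch below evaluates the first line of (5) for every admissible x and compares the minimum with the closed form; `evenodd_bandwidth` and `best_split` are hypothetical names introduced only for this illustration.

```python
def evenodd_bandwidth(p, x):
    """First line of (5): blocks sent when x horizontal and p-1-x
    diagonal parity groups are used to rebuild one systematic column."""
    horizontal = x * p
    diagonal = (p - 1 - x) * (p - 1)
    crossings = x * (p - 1 - x)
    return horizontal + diagonal - crossings + 2   # +2 for the two sums


def best_split(p):
    """Return (minimum bandwidth, smallest x achieving it) over 0 <= x <= p-1."""
    return min((evenodd_bandwidth(p, x), x) for x in range(p))


for p in (3, 5, 7, 11, 13):
    gamma, x = best_split(p)
    assert 4 * gamma == 3 * p * p - 4 * p + 9
    print(p, x, gamma)
```

For p = 5 the minimum is 16 blocks, reached at x = 1 and x = 2, out of M = 20 information blocks.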

This result states that we only need to send about 3/4 
of the total amount of information. And the slopes of 
the n chosen parity groups do not matter as long as half 
are horizontal and half are diagonal. Moreover, similar 
repair bandwidth can be achieved using RDP or X-code. 
For RDP code, the repair bandwidth is

\[
\gamma = \frac{3(p-1)^2}{4}
\]

which was also derived independently in [16]. For X-code,
the repair bandwidth is at most

\[
\frac{3p^2 - 2p - 5}{4}
\]

The derivation for RDP is the following. For RDP
code, the first p - 1 columns are information. The p-th
column is the horizontal parity. The (p + 1)-th column is
the slope 1 diagonal parity (including the p-th column).
The diagonal starting at $a_{p,1} = 0$ is not included in any
diagonal parities. Suppose the first column is erased.
Each horizontal or diagonal parity group will require
p - 1 blocks of transmission. Every horizontal parity
group crosses with every diagonal parity group. Suppose
(p - 1)/2 horizontal parity groups and (p - 1)/2 diagonal
parity groups are transmitted. Then the total transmission
is

\[
\gamma = \underbrace{(p-1)(p-1)}_{p-1 \text{ parity groups}}
       - \underbrace{\frac{p-1}{2} \cdot \frac{p-1}{2}}_{\text{crossings}}
       = \frac{3(p-1)^2}{4}
\]

The derivation for X-code is as follows. For X-code,
the (p - 1)-th row is the parity of slope -1, excluding
the p-th row. The p-th row is the parity of slope 1,
excluding the (p - 1)-th row. Suppose the first column
is erased. First notice that for each parity group, p - 2
blocks need to be transmitted. To recover the parity block
$a_{p-1,1}$, one has to transmit the slope -1 parity group
starting at $a_{p-1,1}$. To recover the parity block $a_{p,1}$, the
slope 1 parity group starting at $a_{p,1}$ must be transmitted.
But it should be noted that by the construction of X-code,
this slope 1 parity group essentially is the diagonal
starting at $a_{p-1,1}$, except for the first element $a_{p,1}$. Zero
crossings happen between two parity groups of slopes -1
and 1, starting at $a_{i,1}$ and $a_{j,1}$, if

\[
\langle i + j \rangle = p - 2 \quad \text{or} \quad \langle i + j \rangle = p
\]

Each slope 1 parity group has no more than 2 zero
crossings with the slope -1 parity groups.

Suppose we choose arbitrarily (p - 1)/2 slope 1
parity groups and (p - 3)/2 slope -1 parity groups for
the information blocks in the first column. Then, not
considering the parity group containing $a_{p,1}$, the numbers
of slope 1 and slope -1 parity groups are both (p - 1)/2.
Excluding zero crossings, each slope 1 parity group
crosses with at least

\[
(p-1)/2 - 2 = (p-5)/2
\]

slope -1 parity groups. The total transmission is

\[
\gamma \le \underbrace{p(p-2)}_{p \text{ parity groups}}
         - \underbrace{\frac{p-1}{2} \cdot \frac{p-5}{2}}_{\text{crossings}}
         = \frac{3p^2 - 2p - 5}{4}
\]

Also, equation (5) is optimal in some conditions:

Theorem 2. The transmission bandwidth in (5) is optimal
to recover a systematic node for EVENODD if no linear
combinations are sent except $\sum_{i=1}^{p-1} b_{i,v}$, for v = 0, 1.

Proof: To recover a systematic node, say, the first
node, parity blocks $b_{i,v}$, $i = 1, 2, \dots, p-1$ must be
sent, where v can be 0 or 1 for each i. This is because
$a_{i,1}$ is only included in $b_{i,0}$ or $b_{i,1}$. Besides, given $b_{i,v}$,
the whole parity group $B_{i,v}$ must be sent to recover
the lost block. Therefore, our strategy of choosing x
horizontal parity groups and p - 1 - x diagonal parity
groups has the most efficient transmission. Finally, since
(5) is minimized over all possible x, it is optimal. ∎

The lower bound by (4) is

\[
\frac{Md}{(d-k+1)k} = \frac{M(n-1)}{(n-k)k}
= \frac{p(p-1)(p+1)}{2p} = \frac{p^2-1}{2}
\]

where d = n - 1, n = p + 2, k = p, and M = p(p - 1).
It should be noted that (4) assumes that each node sends
the same number of blocks, but our method does not.



Example 3. Consider the EVENODD code with p = 5
in Figure 2. For 1 ≤ i ≤ 4, the code has information
blocks $a_{i,j}$, 1 ≤ j ≤ 5, and parity blocks $b_{i,v}$, v = 0, 1.
Suppose the first column is lost. Then by (5), we can
choose parity groups $B_{1,0}, B_{2,0}, B_{3,1}, B_{4,1}$. The blocks
sent are: $\sum_{i=1}^{4} b_{i,0}$, $\sum_{i=1}^{4} b_{i,1}$, $b_{1,0}, b_{2,0}, b_{3,1}, b_{4,1}$ from
the parity nodes and $a_{1,2}, a_{1,3}, a_{1,4}, a_{1,5}, a_{2,2}, a_{2,3}, a_{2,4},
a_{2,5}, a_{4,5}, a_{3,2}$ from the systematic nodes. Altogether, we
send 16 blocks, the number specified by (5). We can
see that $a_{1,3}$ is the crossing of $B_{1,0}$ and $B_{3,1}$. Similarly,
$a_{1,4}, a_{2,2}, a_{2,3}$ are crossings and are only sent once for two
parity groups.
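
The repair in Example 3 can be replayed end to end. The sketch below is an illustration under stated assumptions rather than the authors' code: blocks are integers, `wrap` and `repair_first_column` are hypothetical helpers, and the parity columns are recomputed inside the function only so that the example is self-contained. It records every distinct block that crosses the network and rebuilds column 1 from those blocks alone.

```python
import random
from functools import reduce
from operator import xor


def wrap(x, p):
    """The paper's <x> = (x - 1) mod p + 1."""
    return (x - 1) % p + 1


def repair_first_column(data, p, horizontal_rows):
    """Rebuild column 1 of an EVENODD array.

    Rows in horizontal_rows use their horizontal parity group, all other
    rows use their diagonal group. Returns (recovered, blocks_sent).
    """
    def a(i, j):
        return 0 if i == p else data[i - 1][j - 1]

    rows, cols = range(1, p), range(1, p + 1)
    s1 = reduce(xor, (a(wrap(1 - j, p), j) for j in cols))
    b0 = {i: reduce(xor, (a(i, j) for j in cols)) for i in rows}
    b1 = {i: reduce(xor, (a(wrap(i + 1 - j, p), j) for j in cols)) ^ s1
          for i in rows}

    sent = {("sum", 0), ("sum", 1)}          # one summation block per parity node
    s1_known = reduce(xor, b0.values()) ^ reduce(xor, b1.values())
    recovered = {}
    for i in rows:
        if i in horizontal_rows:
            helpers = [(i, j) for j in range(2, p + 1)]
            sent.add(("b", i, 0))
            value = b0[i]
        else:
            helpers = [(wrap(i + 1 - j, p), j) for j in range(2, p + 1)]
            helpers = [(r, c) for r, c in helpers if r != p]  # skip imaginary row
            sent.add(("b", i, 1))
            value = b1[i] ^ s1_known
        for r, c in helpers:
            sent.add(("a", r, c))            # a set, so crossings count once
            value ^= data[r - 1][c - 1]
        recovered[i] = value
    return recovered, len(sent)


random.seed(1)
p = 5
data = [[random.getrandbits(8) for _ in range(p)] for _ in range(p - 1)]
recovered, blocks = repair_first_column(data, p, horizontal_rows={1, 2})
print(blocks)  # 16
```

With rows 1 and 2 repaired horizontally and rows 3 and 4 diagonally, the set of transmitted blocks has 16 elements, matching the count in Example 3.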




Fig. 2. Repair of an EVENODD code with p = 5. The first column
is erased, shown in the box. 14 blocks are transmitted, shown by the
blocks on the horizontal or diagonal lines. Each line (with wrap around)
is a parity group. 2 blocks in summation form, $\sum_{i=1}^{4} b_{i,0}$ and
$\sum_{i=1}^{4} b_{i,1}$, are also needed but are not shown in the graph.

IV. r Parity Nodes and One Erased Node 

Next we discuss the repair of array codes with r
columns of parity, r ≥ 3. We consider the recovery
in the case of one missing systematic column. In this
section, we are going to use the extended EVENODD
code [3], i.e., codes with parity columns of slopes
0, 1, ..., r - 1. Similar results can be derived for STAR
code. Suppose the first column is erased without loss of
generality.

Let us first assume r = 3, so the parity columns
have slopes 0, 1, 2. The repair strategy is: sending parity
groups $B_{3n+v,v}$ for v = 0, 1, 2 and $1 \le 3n + v \le p - 1$.

Let $A = \lfloor (p-1)/3 \rfloor$. Notice that $0 \le n \le A$ and
each slope has no more than $\lceil (p-1)/3 \rceil$ but no less
than $\lfloor (p-1)/3 \rfloor = A$ parity groups.

Since there are three different slopes, there are crossings
between slope 0 and 1, slope 1 and 2, and slope
2 and 0. For any two parity groups $B_{i,2}$ and $B_{k,1}$,
$\langle k - i \rangle \neq 1$, so (3) does not hold. Hence no zero
crossing exists for the chosen parity groups, and
every crossing corresponds to one block of saving in
transmission. However, the total number of crossings is
not equal to the sum of crossings between every two
parity groups with different slopes. Three parity groups
with slopes 0, 1, and 2 may share a common block, which
should be subtracted from the sum.

Notice that the parity group $B_{i,v}$ contains the block
$a_{i-vy,\,y+1}$. The modulo function "<>" is omitted in
the subscripts. For three transmitted parity groups
$B_{3n,0}, B_{3m+1,1}, B_{3l+2,2}$, if there is a common block
in column y + 1, then it is in row $3n = 3m + 1 - y = 3l + 2 - 2y \pmod p$.
To solve this, we get
$y = 3(m - n) + 1 = 3(l - m) + 1 \pmod p$, or
$m - n = l - m \pmod p$. Notice $0 \le n, m, l < p/3$, so
$-p/3 < m - n,\ l - m < p/3$. Therefore, $m - n = l - m$
without modulo p. Thus $l - n$ must be an even number.
For fixed n, either $n \le m \le l \le A$, and there are
no more than $(A - n)/2 + 1$ solutions for $(m, l)$; or
$0 \le l < m < n$, and the number of $(m, l)$ is no more
than $n/2$. Hence, the number of $(n, m, l)$ is no more than
$\sum_{n=1}^{A} \left( (A - n)/2 + 1 + n/2 \right) = A^2/2 + A$.

The total number of blocks in the p - 1 chosen parity
groups is less than p(p - 1). There are no less than A
parity groups of slope v, for all 0 ≤ v ≤ 2, therefore
for 0 ≤ u < v ≤ 2, parity groups with slopes u and v
have no less than $A^2$ crossings. Hence the total number
of blocks sent in order to recover one column is:

\[
\begin{aligned}
\gamma &< \underbrace{p(p-1)}_{p-1 \text{ parity groups}} - 3A^2 + \frac{A^2 + 2A}{2} + 3 \\
       &< \frac{13}{18}p^2 + \frac{17}{9}p - \frac{47}{18}
\end{aligned} \tag{6}
\]

where $(p-4)/3 < A \le (p-1)/3$. The above estimation
is an upper bound because there may be better ways to
assign the slopes of each parity group. Thus, we need to
send no more than about 13M/18 blocks if r = 3.
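
The r = 3 strategy can be checked by brute force for a small prime. The sketch below is illustrative only and makes these assumptions: group $B_{m,v}$ is taken to consist of the blocks $a_{\langle m - vy \rangle, y+1}$ for columns 2 through p, blocks in the imaginary row are never sent, and each chosen group contributes one parity block plus three summation blocks in total. The names `r3_repair_blocks` and `bound_6` are hypothetical.

```python
def wrap(x, p):
    """The paper's <x> = (x - 1) mod p + 1."""
    return (x - 1) % p + 1


def r3_repair_blocks(p):
    """Count distinct blocks sent when row m of column 1 is repaired with
    the parity group of slope v = m mod 3 (so m = 3n + v)."""
    systematic = set()
    groups = 0
    for m in range(1, p):
        v = m % 3
        groups += 1
        for y in range(1, p):              # y + 1 runs over columns 2..p
            row = wrap(m - v * y, p)
            if row != p:                   # imaginary row is all zeros
                systematic.add((row, y + 1))
    return len(systematic) + groups + 3    # info blocks + parity blocks + sums


def bound_6(p):
    """First line of (6) with A = floor((p - 1) / 3)."""
    A = (p - 1) // 3
    return p * (p - 1) - 3 * A * A + (A * A + 2 * A) / 2 + 3


print(r3_repair_blocks(5), bound_6(5))  # 16 21.5
```

For p = 5 the strategy sends 16 blocks, which is below the first line of (6) evaluated at A = 1.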

By abuse of notation, we write
$B_{m,v} = \{ a_{\langle m+v(1-j) \rangle, j} : j = 2, \dots, p \}$ as the set
of blocks (including the imaginary p-th row)
in the parity group except $S_v$ and $a_{m,1}$. Let
$M_v \subseteq \{1, 2, \dots, p-1\}$, $0 \le v \le r-1$, be
disjoint sets such that $\bigcup_{v=0}^{r-1} M_v = \{1, 2, \dots, p-1\}$. Let
$B_{M_v,v} = \bigcup_{m \in M_v} B_{m,v}$. For given $M_v$, define a function
$f$ as

\[
f(v_1, v_2, \dots, v_k) = \left| \left\{ m_1 \in M_{v_1}, \dots, m_k \in M_{v_k} :
\frac{m_2 - m_1}{v_2 - v_1} \equiv \frac{m_3 - m_2}{v_3 - v_2} \equiv \cdots
\equiv \frac{m_k - m_{k-1}}{v_k - v_{k-1}} \bmod p \right\} \right|
\]

for $k \ge 3$, and $0 \le v_1 < v_2 < \cdots < v_k \le r-1$. Then we have the
following theorem:

Theorem 4. For the extended EVENODD with r ≥ 3, the
repair bandwidth for one erased systematic node is

\[
\begin{aligned}
\gamma <{}& p(p-1) + p + r - \sum_{0 \le v_1 < v_2 \le r-1} |M_{v_1}||M_{v_2}| \\
&+ \sum_{0 \le v_1 < v_2 < v_3 \le r-1} f(v_1, v_2, v_3) - \cdots
 + (-1)^{r-1} f(0, 1, \dots, r-1)
\end{aligned} \tag{7}
\]



Proof: Suppose the first column is missing and
we transmit the parity groups $B_{m,v}$, $m \in M_v$, for
$v = 0, 1, \dots, r-1$. Since the union of $M_v$ covers
$\{1, 2, \dots, p-1\}$, all the blocks in the first column can be
recovered. The repair bandwidth is the cardinality of the
union of $B_{M_v,v}$ plus the number of zero crossings and
the summation blocks $\sum_{i=1}^{p-1} b_{i,v}$. The number of zero
crossings is no more than the size of the imaginary row,
p. The number of the summation blocks is r.

By the inclusion-exclusion principle, the cardinality of the
union of $B_{M_v,v}$ is

\[
\begin{aligned}
\Big| \bigcup_{0 \le v \le r-1} B_{M_v,v} \Big|
={}& \sum_{0 \le v \le r-1} |B_{M_v,v}|
 - \sum_{0 \le v_1 < v_2 \le r-1} |B_{M_{v_1},v_1} \cap B_{M_{v_2},v_2}| \\
&+ \sum_{0 \le v_1 < v_2 < v_3 \le r-1} |B_{M_{v_1},v_1} \cap B_{M_{v_2},v_2} \cap B_{M_{v_3},v_3}| \\
&- \cdots + (-1)^{r-1} |B_{M_0,0} \cap B_{M_1,1} \cap \cdots \cap B_{M_{r-1},r-1}|
\end{aligned}
\]

Every $|B_{m,v}| < p$, so $\sum_{0 \le v \le r-1} |B_{M_v,v}| < p(p-1)$.
Every two parity groups $B_{m_1,v_1}, B_{m_2,v_2}$ cross at a block.
Hence $|B_{M_{v_1},v_1} \cap B_{M_{v_2},v_2}| = |M_{v_1}||M_{v_2}|$. Since $B_{m,v}$
contains $a_{\langle m+v(1-j) \rangle, j}$, $j = 2, \dots, p$, the intersection
of more than two parity groups $B_{m_1,v_1}, \dots, B_{m_k,v_k}$ is
equivalent to the solutions of

\[
m_1 - v_1 y = m_2 - v_2 y = \cdots = m_k - v_k y \bmod p
\]

where y + 1 is the column index of the intersection. Or,

\[
y = \frac{m_2 - m_1}{v_2 - v_1} = \cdots = \frac{m_k - m_{k-1}}{v_k - v_{k-1}} \bmod p
\]

Therefore,

\[
|B_{M_{v_1},v_1} \cap B_{M_{v_2},v_2} \cap \cdots \cap B_{M_{v_k},v_k}| = f(v_1, v_2, \dots, v_k)
\]

And (7) follows. ∎
We can see that (6) is a special case of (7), with
$M_v = \{3n + v : 1 \le 3n + v \le p - 1\}$, for v = 0, 1, 2. For
r = 4, 5, we can derive similar bounds by defining $M_v$.
Choose

\[
M_v = \{ rn + v : 1 \le rn + v \le p - 1 \} \tag{8}
\]

for $v = 0, 1, \dots, r-1$. Let $A = \lfloor (p-1)/r \rfloor$. For
$0 \le v_1 < v_2 < v_3 \le r-1$, $f(v_1, v_2, v_3)$ becomes the
number of $(n_1, n_2, n_3)$, $1 \le r n_i + v_i \le p - 1$, such that

\[
(n_2 - n_1)(v_3 - v_2) \equiv (n_3 - n_2)(v_2 - v_1) \bmod p
\]

Since $-p/r < n_2 - n_1,\ n_3 - n_2 < p/r$, and
$(v_3 - v_2) + (v_2 - v_1) < r$, the above equation becomes

\[
(n_2 - n_1)(v_3 - v_2) = (n_3 - n_2)(v_2 - v_1)
\]

without modulo p. Therefore,

\[
\begin{aligned}
n_3 - n_1 &= (n_3 - n_2) + (n_2 - n_1) \\
&= c \cdot \operatorname{lcm}(v_3 - v_2, v_2 - v_1)
   \left( \frac{1}{v_3 - v_2} + \frac{1}{v_2 - v_1} \right) \\
&= c \, \frac{v_3 - v_1}{\gcd(v_3 - v_2, v_2 - v_1)}
\end{aligned}
\]

where c is an integer constant, lcm is the least common
multiple and gcd is the greatest common divisor. For
fixed $n_1$, the number of solutions for $(n_2, n_3)$ is no
more than $1 + (A - n_1)\gcd(v_3 - v_2, v_2 - v_1)/(v_3 - v_1)$
when $n_1 \le n_2 \le n_3 \le A$, and no more than
$n_1 \gcd(v_3 - v_2, v_2 - v_1)/(v_3 - v_1)$ when $0 \le n_3 < n_2 < n_1$. The
number of $(n_1, n_2, n_3)$ is

\[
f(v_1, v_2, v_3) \le \sum_{n_1=1}^{A} \left( 1 + (A - n_1 + n_1)
\frac{\gcd(v_3 - v_2, v_2 - v_1)}{v_3 - v_1} \right)
= A + A^2 \frac{\gcd(v_3 - v_2, v_2 - v_1)}{v_3 - v_1}
\]

Similarly, for four parity groups,

\[
f(v_1, v_2, v_3, v_4) \ge A \left( 1 + (A + 2)
\frac{\gcd(v_4 - v_3, v_3 - v_2, v_2 - v_1)}{v_4 - v_1} \right)
\]

For five parity groups,

\[
f(v_1, v_2, v_3, v_4, v_5) \le A + A^2
\frac{\gcd(v_5 - v_4, v_4 - v_3, v_3 - v_2, v_2 - v_1)}{v_5 - v_1}
\]

When r = 4, equation (7) becomes

\[
\gamma < p(p-1) + p + 4 - \sum_{0 \le v_1 < v_2 \le 3} |M_{v_1}||M_{v_2}|
 + \sum_{0 \le v_1 < v_2 < v_3 \le 3} f(v_1, v_2, v_3) - f(0, 1, 2, 3)
\]

By the previous equations,

\[
\begin{aligned}
f(0,1,2),\ f(1,2,3) &\le A(1 + A/2) \\
f(0,1,3),\ f(0,2,3) &\le A(1 + A/3) \\
f(0,1,2,3) &\ge A(1 + (A + 2)/3)
\end{aligned}
\]

And the repair bandwidth is

\[
\gamma \approx p^2 - \frac{14}{3} \left( \frac{p}{4} \right)^2 = \frac{17}{24} p^2
\]

where the terms of lower orders are omitted.
When r = 5, we can use (7) again and get

$\gamma \approx p^2 + (- \dots$

where the terms of lower orders are omitted.

It should be noted that the number of common blocks
affects the bandwidth a lot. If we consider only the first 4
terms in (7), any assignment of $M_v$ with equal sizes will
result in a lower bound of $\gamma > (r+1)p^2/(2r) \approx p^2/2$,
when r is large. But due to the common blocks, the true $\gamma$
values for r = 4, 5 using (8) show only slight improvement
compared to the case of r = 3.

The lower bound (4) is
$\frac{Md}{k(d-k+1)} = \frac{p(p-1)(p+r-1)}{pr} = \frac{(p-1)(p+r-1)}{r}$.
When r = 3, this bound is about $p^2/3$.
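
To see how little the leading term improves as r grows, one can tabulate the coefficient of $p^2$ implied by (7) and (8). The sketch below rests on explicit assumptions that go beyond the text: every $|M_v|$ is taken as $A \approx p/r$, every k-fold term $f(v_1, \dots, v_k)$ is replaced by its leading part $A^2 \gcd(\cdot)/(v_k - v_1)$ for all $k \ge 3$, and all lower-order terms are dropped. The function `leading_coefficient` is a hypothetical helper, and its r = 5 value is the output of this sketch, not a figure quoted from the paper.

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations
from math import gcd


def leading_coefficient(r):
    """Coefficient c in gamma ~ c * p^2 under the assumptions stated above."""
    # Pairwise crossings: one A^2 term for each pair of slopes.
    total = Fraction(-(r * (r - 1) // 2))
    for k in range(3, r + 1):
        sign = 1 if k % 2 == 1 else -1     # inclusion-exclusion sign
        for slopes in combinations(range(r), k):
            gaps = [b - a for a, b in zip(slopes, slopes[1:])]
            total += sign * Fraction(reduce(gcd, gaps), slopes[-1] - slopes[0])
    return 1 + total / (r * r)             # A^2 is about (p / r)^2


for r in (3, 4, 5):
    print(r, leading_coefficient(r))
```

Under these assumptions the coefficient is 13/18 for r = 3, matching (6), and drops only to 17/24 for r = 4 and 53/75 for r = 5, which illustrates the slight improvement described above.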

V. 3 Parity Nodes and 2 Erased Nodes 

Up to now, we have considered the recovery problem
given that one column is erased. Next, let us assume
that two information columns are erased and we need to
recover them successively. So we first recover one of the
erased nodes, and then the other one. The first recovery
is discussed in this section, and the second recovery was
already discussed in the previous sections. Suppose we
have 3 columns of parity with slopes -1, 0, and 1, which
is in fact the STAR code in [7]. Again, the arguments
can be applied to extended EVENODD in a similar way.
Without loss of generality, assume the first and (x + 1)-th
columns are missing, 1 ≤ x ≤ p - 1.

Let $B_{i,0}$, $B_{i,1}$, and $B_{i,-1}$ be the i-th parity groups of slopes
0, 1, and -1, respectively, $i = 1, 2, \dots, p-1$. The
following are 3(p - 1)/2 parity groups that repair the first
column: $B_{0,-1}, B_{x,0}, B_{2x,1}, B_{2x,-1}, B_{3x,0}, B_{4x,1}, \dots,
B_{(p-3)x,-1}, B_{(p-2)x,0}, B_{(p-1)x,1}$. For each parity block
above, the corresponding recovered blocks are: $a_{x,1+x},
a_{x,1}, a_{2x,1}, a_{3x,1+x}, a_{3x,1}, a_{4x,1}, \dots, a_{(p-2)x,1+x},
a_{(p-2)x,1}, a_{(p-1)x,1}$. An example of p = 5, x = 1 is
shown in Figure 3.
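
The order in which the parity groups are used can be generated mechanically. The following sketch lists, for given p and x, each parity group together with the block it recovers; it is an illustration of the sequence above under the convention that row indices are reduced modulo p and 0 denotes the imaginary row, and `star_repair_order` is a hypothetical name.

```python
def star_repair_order(p, x):
    """Return a list of ((group_row, slope), (row, column)) pairs.

    Each pair names a parity group B_{group_row, slope} and the erased
    block a_{row, column} that it recovers, in the order of use.
    """
    steps = []
    for t in range(0, p - 1, 2):           # t = 0, 2, ..., p - 3
        r1 = ((t + 1) * x) % p
        r2 = ((t + 2) * x) % p
        # Slope -1 group recovers a block of the second erased column.
        steps.append((((t * x) % p, -1), (r1, 1 + x)))
        # The horizontal group of that row then recovers column 1.
        steps.append(((r1, 0), (r1, 1)))
        # The slope 1 group reuses the same helper block for the next row.
        steps.append(((r2, 1), (r2, 1)))
    return steps


for group, block in star_repair_order(5, 1):
    print(group, "->", block)
```

For p = 5 and x = 1 this prints six steps, and the blocks recovered in column 1 are rows 1 through 4.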

Rearrange the columns in the following order:
Columns $1, 1 + x, 1 + 2x, \dots, 1 + (p-1)x$ (every index
is computed modulo p). We can see that the chosen
parity groups $B_{j,0}$, $j = x, 3x, \dots, (p-2)x$ contain the
blocks in Rows $Z = \{x, 3x, \dots, (p-2)x\}$. $B_{jx,1}$ contains
blocks $a_{jx,1}, a_{(j-1)x,1+x}, \dots, a_{(j-p+1)x,1+(p-1)x}$
for $j = 2, 4, \dots, p-1$. Similarly, $B_{jx,-1}$ contains
blocks $a_{jx,1}, a_{(j+1)x,1+x}, \dots, a_{(j+p-1)x,1+(p-1)x}$
for $j = 0, 2, \dots, p-3$.

Now notice that the blocks included in the above
parity groups have the (1 + x)-th column as the vertical
symmetry axis. That is, the row indices of the blocks
needed in Columns 1 and 1 + 2x are the same; those of
Columns 1 + (p - 1)x and 1 + 3x are the same; ...; those
of Columns 1 + (p + 3)x/2 and 1 + (p + 1)x/2 are the
same. For example, the second column in Figure 3 is the
symmetry axis. Thus, we only need to consider Columns
1 + 2x, 1 + 3x, ..., 1 + (p + 1)x/2.

For columns 1 + ix, where i is even and $2 \le i \le (p+1)/2$,
parity groups $\{B_{2x,1}, B_{4x,1}, \dots, B_{(p-1)x,1}\}$ include
the blocks in Rows $X = \{2x, 4x, \dots, (p-1-i)x\}$.
Parity groups $\{B_{0,-1}, B_{2x,-1}, \dots, B_{(p-3)x,-1}\}$ include
the blocks in Rows $Y = \{ix, (i+2)x, \dots, (p-1)x\}$.
Since $2 \le i \le (p+1)/2$, we have $i \le (p-1-i) + 2$,
and $X \cup Y = \{2x, 4x, \dots, (p-1)x\}$. Hence $X \cup Y \cup Z =
\{1, 2, \dots, p-1\}$. Thus every block in Column 1 + ix
needs to be sent, for even i.

Similarly, for Columns 1 + ix, where i is odd
and $3 \le i \le (p+1)/2$, parity groups
$\{B_{2x,1}, B_{4x,1}, \dots, B_{(p-1)x,1}\}$ include the blocks in
Rows $X = \{(p-i+2)x, (p-i+4)x, \dots, (p-1)x\}$.
Parity groups $\{B_{0,-1}, B_{2x,-1}, \dots, B_{(p-3)x,-1}\}$ include
the blocks in Rows $Y = \{2x, 4x, \dots, (i-3)x\}$. Since
$2 < i \le (p+1)/2$, we have $i - 3 < p - i + 2$, and
$X \cup Y = \{2x, 4x, \dots, (i-3)x, (p-i+2)x, (p-i+4)x, \dots, (p-1)x\}$.
Therefore, the rows not included in
X or Y or Z are $W = \{(i-1)x, (i+1)x, \dots, (p-i)x\}$
and $|W| = (p+3)/2 - i$. The total saving in block
transmissions for all the columns is:

\[
2 \sum_{i \text{ odd},\ 3 \le i \le (p+1)/2} \left( \frac{p+3}{2} - i \right)
= \begin{cases}
\dfrac{(p-1)^2}{8}, & \frac{p+1}{2} \text{ odd} \\[2ex]
\dfrac{(p+1)(p-3)}{8}, & \frac{p+1}{2} \text{ even}
\end{cases}
\]
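
The saving can be cross-checked by summing $|W| = (p+3)/2 - i$ over the odd column offsets and comparing with the two closed forms. The sketch assumes the factor of 2 that comes from the symmetry about Column 1 + x described above; `star_saving` and `star_saving_closed_form` are hypothetical names.

```python
def star_saving(p):
    """Sum |W| over odd i in [3, (p+1)/2], doubled for the mirrored columns."""
    top = (p + 1) // 2
    return 2 * sum((p + 3) // 2 - i for i in range(3, top + 1, 2))


def star_saving_closed_form(p):
    """Closed form, split by the parity of (p + 1) / 2."""
    if ((p + 1) // 2) % 2 == 1:
        return (p - 1) ** 2 // 8
    return (p + 1) * (p - 3) // 8


for p in (5, 7, 11, 13, 17, 19):
    assert star_saving(p) == star_saving_closed_form(p)
    print(p, star_saving(p))
```

Both forms grow like $p^2/8$, which is the source of the 7/8 fraction in the theorem that follows.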

The above argument can be summarized in the following
theorem.

Theorem 5. When two systematic nodes are erased in a
STAR code, there exists a strategy that transmits about 7/8
of all the information blocks, and about 1/2 of all the
parity blocks so as to recover one node.

The repair bandwidth $\gamma$ in the above theorem is
about $7p^2/8$. Comparing it to the lower bound (4),

\[
\frac{Md}{k(d-k+1)} = \frac{p(p-1)(p+1)}{2p} \approx \frac{p^2}{2}
\]

we see a gap of $3p^2/8$ in total transmission.



Fig. 3. The recovery strategy for the first column in STAR code when
the first and second columns are missing, p = 5, x = 1.

VI. Conclusions

We presented an efficient way to repair one lost node
in EVENODD codes and two lost nodes in STAR codes.
Our achievable schemes outperform the naive method of
rebuilding by reconstructing all the data. For EVENODD
codes, a bandwidth of roughly 3M/4 is sufficient to 
repair an erased systematic node. Moreover, if no linear
combinations of bits are transmitted, the proposed repair
method has optimal repair bandwidth with the sole exception
of the sum of the parity nodes. Since array codes
only operate on binary symbols, and our repair method
involves no linear combination of content within a node
except in the parity nodes, the proposed construction is
computationally simple and also requires smaller disk
I/O to read data during repairs.

There are several open problems on using array codes 
for distributed storage. Although our scheme does not 
achieve the information theoretic cut-set bound, it is not 
clear if that bound is achievable for fixed code structures 
or limited field sizes. If we allow linear combinations 
of bits within each node, the optimal repair remains 
unknown. Our simulations indicate that shortening of
EVENODD (using fewer than p columns of information)
further reduces the repair bandwidth, but proper shortening
rules and repair methods need to be developed.
Repairing other families of array codes or Reed-Solomon
codes would also be of substantial practical interest.